The Data Visualization tab makes customizable plots of the signatures and exposures predicted by the LDA and NMF algorithms, using the ggplot2 package. Interactive plots can also be generated with plotly.
Signatures are shown only as bar plots, with each bar representing the probability of one mutation type. By default, signatures are named by number, but you can rename them, for example after their possible etiology.
In this tutorial, we identified 8 single-base signatures from a mixture of lung adenocarcinoma, lung squamous cell carcinoma, and skin cutaneous melanoma samples.
To visualize the exposure of each signature in each sample, three plot types are available: bar plot, box plot, and violin plot.
By default, a stacked bar plot sorted by the total number of mutations is used. Each stacked bar shows the proportion of exposure attributed to each signature.
The stacked bar plot can also be ordered by signatures. If Signatures is selected in the Sort By option, a bucket list appears that lets you select the signatures to sort by, dragging them from the left box to the right box. You can also set a limit on the number of samples to display.
This stacked bar plot is now ordered by the exposure of signature 1, and only the top 400 samples are included.
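The sorting and sample limit described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the app's implementation: the function name, the sample names, and the exposure values are all made up, and it assumes that samples are ranked in descending order of the chosen signature's proportion.

```python
def top_samples_by_signature(exposures, signature, limit):
    """Rank samples by the proportion of one signature and keep the first `limit`.

    `exposures` maps sample name -> {signature name: exposure count}.
    Hypothetical helper for illustration only.
    """
    proportions = {}
    for sample, counts in exposures.items():
        total = sum(counts.values())
        if total == 0:
            continue  # a sample with no exposure has no defined proportions
        proportions[sample] = {sig: n / total for sig, n in counts.items()}
    # Assumption: descending order, so the most exposed samples come first.
    ranked = sorted(
        proportions.items(),
        key=lambda item: item[1][signature],
        reverse=True,
    )
    return ranked[:limit]


# Made-up exposure counts for three samples and two signatures.
exposures = {
    "sample_a": {"Signature1": 30, "Signature2": 70},
    "sample_b": {"Signature1": 90, "Signature2": 10},
    "sample_c": {"Signature1": 50, "Signature2": 50},
}
top_two = top_samples_by_signature(exposures, "Signature1", limit=2)
print([sample for sample, _ in top_two])  # ['sample_b', 'sample_c']
```

Each entry in the result holds the per-signature proportions that would make up one stacked bar.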
Box plots and violin plots can be used to visualize the distribution of exposures or to compare exposures between different groups of samples.
By default, a box plot of exposure for each signature will be shown.
If an annotation file is provided, you can visually compare signature exposures among different groups.
In this box plot, the exposures of each signature are grouped by tumor type. Signatures 1 and 5 have high exposure in lung cancer samples, while signatures 3 and 8 are enriched in skin cancer samples.
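As a rough sketch of what grouping by annotation means, the Python snippet below computes the median exposure of each signature within each group, which is the center line a box plot would draw. The helper name, sample names, group labels, and values are invented for illustration and are not taken from the tutorial data.

```python
from collections import defaultdict
from statistics import median


def median_exposure_by_group(exposures, annotation):
    """Return {group: {signature: median exposure}}.

    `exposures` maps sample -> {signature: exposure};
    `annotation` maps sample -> group label (e.g. tumor type).
    Hypothetical helper for illustration only.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for sample, sig_values in exposures.items():
        group = annotation[sample]
        for sig, value in sig_values.items():
            grouped[group][sig].append(value)
    return {
        group: {sig: median(values) for sig, values in sigs.items()}
        for group, sigs in grouped.items()
    }


# Made-up proportional exposures and tumor-type labels.
exposures = {
    "s1": {"Signature1": 0.8, "Signature3": 0.2},
    "s2": {"Signature1": 0.6, "Signature3": 0.4},
    "s3": {"Signature1": 0.1, "Signature3": 0.9},
}
annotation = {"s1": "lung", "s2": "lung", "s3": "skin"}

medians = median_exposure_by_group(exposures, annotation)
for group, sigs in medians.items():
    print(group, sigs)
```

Comparing the medians across groups for one signature gives the same kind of reading as comparing boxes side by side.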
We can also group samples by signature and then color them by tumor type.
This plot lets you directly compare how the exposure of each signature differs among the three tumor types.
The Clustering subtab provides several algorithms for clustering samples based on the exposure of each signature. After selecting the musica result object, we recommend using the Explore Number of Clusters box to find the optimal number of clusters in your data. Several clustering algorithms and three metrics (within-cluster sum of squares, average silhouette coefficient, and gap statistic) are available for exploration. All algorithms are imported from the factoextra and cluster packages. (Note: if the gap statistic is selected, the plot takes much longer to generate.)
This connected scatter plot shows the within-cluster sum of squares for each number of clusters, predicted using hierarchical clustering. The “elbow” method can be used to determine the optimal number of clusters.
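To make the metric concrete, here is a minimal Python sketch of the within-cluster sum of squares: the sum of squared distances from each point to the centroid of its cluster. It is not the code the app runs. The points are made up, and the cluster labels are assigned by hand rather than produced by hierarchical clustering.

```python
def within_cluster_sum_of_squares(points, labels):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for members in clusters.values():
        dims = len(members[0])
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dims)]
        total += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dims))
    return total


# Four made-up points forming two obvious pairs.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]

# Hand-assigned labelings for 1, 2, and 4 clusters (illustrative only).
labelings = {
    1: [0, 0, 0, 0],
    2: [0, 0, 1, 1],
    4: [0, 1, 2, 3],
}
wss = {k: within_cluster_sum_of_squares(points, labels) for k, labels in labelings.items()}
print(wss)  # {1: 104.0, 2: 4.0, 4: 0.0}
```

The value drops sharply from one cluster to two and only slightly afterwards, so the elbow in this toy example is at two clusters.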
The Clustering box is where you perform the clustering analysis. In addition to the clustering algorithm, you can choose among several methods for calculating the dissimilarity matrix, imported from the philentropy package.
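A dissimilarity matrix holds one pairwise distance for every pair of samples, computed from their exposure vectors. The Python sketch below illustrates this with Euclidean and Manhattan distances. These two measures are chosen only as familiar examples, the function names are hypothetical, and the exposure vectors are made up.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two exposure vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences between two exposure vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def dissimilarity_matrix(rows, metric):
    """Square matrix of pairwise dissimilarities between all rows."""
    return [[metric(a, b) for b in rows] for a in rows]


# Made-up proportional exposures: three samples, two signatures.
samples = [(0.5, 0.5), (1.0, 0.0), (0.5, 0.5)]

for name, metric in (("euclidean", euclidean), ("manhattan", manhattan)):
    print(name)
    for row in dissimilarity_matrix(samples, metric):
        print([round(value, 3) for value in row])
```

The matrix is symmetric with zeros on the diagonal, and identical samples (the first and third here) are at distance zero under both measures. The choice of measure changes the off-diagonal values and can therefore change the resulting clusters.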
This table is the output of the clustering analysis, combined with the annotation.
In the Visualization box, you can make scatter plots that show the clustering results on a UMAP panel calculated from the signature exposures. Three types of plots are provided.
If Signature is selected, samples are grouped by cluster and the plot is repeated once for each signature. In each column, samples are colored by their exposure to one signature.
If Annotation is selected, an additional select box appears and lets you choose one type of annotation of interest. You can then make a plot that groups samples by both cluster and annotation.
If None is selected, a single scatter plot, colored by clusters, will be made.